Efficient video skimmer

ABSTRACT

Disclosed are a system, method, apparatus, and computer readable media containing instructions for displaying video files for rapid searching. Two types of exemplary embodiments are disclosed: a standalone video skimming system, and a video skimming system that includes a server and a client system. In either case, the video file may be locally or remotely stored, or can be obtained from a live feed. The system displays many small windows simultaneously, in which different parts of the video chosen by the user are shown at the same time to shorten the skimming time. The video file is encoded using layered encoding so that smaller versions can be displayed using lower layers, without needing any processing to generate smaller versions of the video from the original full screen version. A video extractor is described for extracting the necessary bitstreams from a local video database containing layered encoded video files according to user specified window sizes, and distributing the signals over the electronic communications network channel. The system also includes a skimming control logic which can receive control commands from clients and invoke the video extractor to extract appropriate audio-visual signals therefrom for each command.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority to U.S. Provisional Application Ser. No. 61/172,355, filed Apr. 24, 2009, which is hereby incorporated by reference herein in its entirety.

BACKGROUND

1. Technical Field

The disclosed invention relates to techniques for searching for content in a compressed digital video file accessed from local storage or over a network such as the Internet. In particular, it relates to the use of layered video coding technology in connection with content searching for retrieving and displaying selected video segments.

2. Background Art

Subject matter related to the present application can be found in co-pending U.S. patent application Ser. Nos. 12/015,956, filed Jan. 17, 2008 and entitled “System And Method For Scalable And Low-Delay Videoconferencing Using Scalable Video Coding,” 11/608,776, filed Dec. 8, 2006 and entitled “Systems And Methods For Error Resilience And Random Access In Video Communication Systems,” and 11/682,263, filed Mar. 5, 2007 and entitled “System And Method For Providing Error Resilience, Random Access And Rate Control In Scalable Video Communications,” and U.S. Pat. No. 7,593,032, filed Jan. 17, 2008 and entitled “System And Method For A Conference Server Architecture For Low Delay And Distributed Conferencing Applications,” each of which is hereby incorporated by reference herein in its entirety.

With increasing computing power and electronic storage capacity, and ubiquity of network bandwidth, the number of large digital video libraries accessible through the Internet is growing rapidly. On the low end of the performance spectrum, databases of popular websites such as YouTube (www.youtube.com) and Facebook (www.facebook.com) are becoming sources for millions of user videos. Currently, these videos are often of small size and low resolution. Considering the continuous increase in the resolutions of consumer video equipment, however, such databases are likely to contain higher resolution videos in the foreseeable future.

On the other end of the performance spectrum, high resolution video presentation technologies such as High-Definition TV (HDTV) are frequently used in entertainment and news video files. Even after applying digital video compression techniques, high resolution video results in large file sizes. Movies and broadcast TV content in standard TV (SDTV) resolution are still of considerable size. Two other important applications where higher resolution video may be used are surveillance video and video segments recorded for scientific experiments.

With the amount of digital video content growing, and that content being spread around the Internet, a technology is needed to search for content in these databases effectively and rapidly. Searching for content in video files has significant applications. Users of such video content searching technology can, for example, include: police officers looking for a specific scene in surveillance videos (content which can be distributed over many sites on the Internet, and can be many hours or days long); students and teachers looking for a specific presentation module; biologists looking for the instance when a particular biological event is triggered; or consumers of movies or news looking for a specific movie scene or news segment. Many users may be interested only in a small portion (perhaps only a few seconds of content) of a full-length video. A noteworthy aspect of content searching in video files is that it normally requires a user's full attention and can't be performed as a background task while multi-tasking.

The following terms are used throughout the disclosure. A “full length video” can be any video footage that is meaningful as a unit, for example, a movie, a TV show, and similar video. A full length video may be divided into segments, which are called “video chapters”, or in short, “chapters”. Therefore, a full length video includes one or more concatenated chapters. Meta information related to a chapter is called a “video index”, and the process by which video indices are generated is called “video indexing”. A video index may include information about the starting and ending times of the chapter, textual information about the chapter's content, and one or more images derived from the chapter that may represent its content. Video chapters may be indexed. One simple form of automatic indexing is to sub-divide a full length video into video chapters of a length that may be configured in the system or specified by the user.

A “raw video” represents uncompressed digitized video. After processing by a video encoder/compressor, raw video becomes “compressed video”. The term “transcoding” refers to the process of converting a compressed video into a different type of compressed video. Transcoding can involve, for example, the transforming of compressed video into raw video, and compressing this raw video into a different type of compressed video.

“Skimming” video, alternatively known as browsing, has been a technical challenge for a long time. There are some techniques commonly used to make skimming a full length video a more efficient process:

a. Fast forwarding: fast forwarding (also known as increasing the video playback speed) shortens the video viewing time. However, speeding up the video rate distorts the video information and may cause elimination of short events. This method has been the most popular browsing technique so far. Fast forwarding is discussed in more detail below.

b. Text Based Queries: This refers to a querying of metadata associated with the full length video or video chapters for specific textual information. For example, a text based query may be in the form of “scene with George falling off the bridge”. Text based queries today require the video to be annotated, mostly a manual process, before the video can be queried. Although text-based video query has been in existence for a long time, only a few applications can afford the intense human effort needed to intelligently categorize and annotate the videos. One example of video content that contains metadata which enables text based queries is medical records used in some systems.

c. Automatic Indexing: In the academic literature [for example, Cees G.M. Snoek and Marcel Worring, “Multimodal Video Indexing: A Review of the State-of-the-art,” Multimedia Tools and Applications, Volume 25, Number 1/January, 2005, Springer], techniques have been proposed to automatically index video for browsing representations based on information within the video. These indexing systems can use, for example, any of the following information aspects to generate video chapters:

- Motion of the video;
- Scene changes;
- Image statistics, such as color and shape;
- Audio information; and/or
- Specific object types in the video.

The prior art, for example, Michael A. Smith, “Video Skimming and Characterization through the Combination of Image and Language Understanding Techniques,” p.775, 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '97), 1997, discloses techniques based on one or more of the above aspects that generate a so-called “video skim”, a short synopsis version (or a small understandable portion) of the original indexed video. By viewing the video skim, a user may obtain an overview of the content of the full length video in a comparatively short period of time. A crude form of a video skim can, for example, consist of images or short scenes representing video chapters of a fixed duration. More complex skims may be generated using previously available text based annotation or automatic indexing techniques as mentioned above.

If a user becomes interested in one part of the skim, he/she can opt to view the associated video chapter in its relevant parts or in its entirety, and possibly at normal playback speed, without having to view the rest of the full length video. The critical aspect of creating a good video skim is context understanding, which is the key to choosing the significant images and words that should be included in the video skim. Today, when using any form of automated video skim generation, it is unfortunately quite frequent that a certain scene, in which a user may be interested, stays unidentified by the skimming process. Nevertheless, with advances in automated indexing techniques, video skims are becoming a useful reality today, and will likely be of even higher interest in the foreseeable future.

In summary, the automated context-sensitive generation of video skims, despite the significant research conducted over the past decade, has remained a task that is difficult, requiring high computational complexity and involving human interaction such as filtering and processing.

Once a video skim is generated, it often needs to be presented to the user.

In its crudest form, video skim generation and presentation can be performed simultaneously based on a full length video file that is available on the user premises. A software application running on the user's computer (or special purpose workstation), or dedicated hardware-based processing, can use any of the techniques mentioned below.

A) Video Skimming Using Raw Video: If the full length video on the local storage is in a raw format, then the skimming can be performed without the dedicated creation of a video skim, by simply “fast forwarding” until the desired segment is found. The fast forwarding process can be defined as “temporal sub-sampling”, or time domain sub-sampling of the frames of the video (e.g., skip every other frame for a playback speed-up of two (i.e., a 50% reduction in playback time), or show every fifth frame for a speed-up of five (i.e., an 80% reduction in playback time), etc.). For example, referring to FIG. 1, a full-length video (101) contains even numbered (102, 104) and odd numbered (103, 105) frames. Using temporal sub-sampling, a temporal sub-sampled sequence (106) is created, in which only the even-numbered frames of the full length video sequence are present (107, 108). This temporal sub-sampling results in a 50% reduction in the playback time. The speed-up can be varied by other sub-sampling intervals. When only every fifth frame (110, 111) of the original full length video sequence is used in the temporal sub-sampled sequence (109), the reduction in playback time is 80%.

Other linear or non-linear sub-sampling factors can also be used.
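As an illustration of temporal sub-sampling with a fixed (linear) interval, the following minimal Python sketch keeps one frame out of every N and computes the resulting reduction in playback time as 1 - 1/N. The function names and the representation of a video as a list of frames are assumptions made for this example only and are not part of the disclosure.

```python
def temporal_subsample(frames, interval):
    """Keep every `interval`-th frame, starting with the first one.

    `frames` is any sequence of frames; here a plain list stands in
    for a raw video (an assumption made for illustration).
    """
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return frames[::interval]


def playback_time_reduction(interval):
    """Fraction of playback time saved when one frame in `interval` is kept."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return 1.0 - 1.0 / interval


# Keeping every other frame halves the playback time.
even_frames = temporal_subsample(list(range(10)), 2)
half = playback_time_reduction(2)
```

A non-linear variant would replace the fixed slice step with an arbitrary list of frame indices to keep.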

At least three disadvantages of fast forwarding are as follows:

(1) The search may still take a long time depending on where the specific video segment of interest is located within the full length video sequence (particularly if it is located towards the end).

(2) The video segment of interest may be made unnoticeable or totally lost during sub-sampling as it may fall on the deleted frames (especially when large sub-sampling intervals are in use).

(3) The associated audio information, if any, often cannot be meaningfully presented.

Another method for skimming raw video is to subdivide the full length raw video into video chapters (which may be of preconfigured lengths or configured in real-time by the user), and allow the user to view more than one video chapter in parallel using separate small windows for each chapter. This process may require “spatial sub-sampling” to reduce the resolution of the original video to fit into smaller windows, because of display size limitations as illustrated in FIG. 2. The full length video (201) may include pictures of a certain spatial resolution, illustrated here by the spatial size of one of the pictures in this sequence (202). In one very simple form, spatial sub-sampling may be, for example, averaging the brightness and/or color component values of four spatially adjacent pixels (203) so as to create a single sub-sampled pixel (204). Doing so for all pixels of all pictures of the full-length video (201) yields a series of sub-sampled pictures (205) at the same frame rate as the full-length video, but at half of the spatial resolution in each dimension.
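The simple four-pixel averaging described above can be sketched as follows. This is a minimal illustration, assuming a picture is represented as a list of rows of numeric samples for a single component (for example, brightness); the function name is hypothetical, and a practical implementation would typically apply a proper low-pass filter to limit aliasing.

```python
def spatial_subsample_2x2(picture):
    """Average each 2x2 block of samples into one sub-sampled sample.

    `picture` is a list of equally long rows of numbers (one component,
    e.g. brightness). A trailing odd row or column is dropped, which is
    a simplification made for this sketch.
    """
    if not picture or not picture[0]:
        return []
    height = len(picture) - len(picture) % 2
    width = len(picture[0]) - len(picture[0]) % 2
    result = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            total = (picture[y][x] + picture[y][x + 1]
                     + picture[y + 1][x] + picture[y + 1][x + 1])
            row.append(total / 4)
        result.append(row)
    return result


# A 2x4 picture becomes a 1x2 picture: half the resolution per dimension.
small = spatial_subsample_2x2([[1, 3, 5, 7],
                               [1, 3, 5, 7]])
```

Because this loop visits every sample of every picture, its cost grows with both the picture size and the number of windows to be filled.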

At least two specific disadvantages of this approach are as follows:

(1) Performing spatial sub-sampling in real-time to generate smaller versions of the full-length video is computationally intensive and time consuming. Depending on how many windows are generated and the size of each window, the sub-sampling may require significant computing resources.

(2) Information may be lost during sub-sampling due to side effects of spatial sub-sampling such as filtering or aliasing.

B) Video Skimming Using Compressed Video: If the full length video is in a compressed format (for example, the full length video is compliant with video compression standards such as ITU-T Rec. H.264 or other video compression standards), then additional factors come into play. Using a compressed format eliminates the need for the very large uncompressed video files and is, therefore, in most cases, advantageous. However, the compressed video file can't be temporally sub-sampled randomly as the sequence of compressed frames may depend on other frames due to inter-picture prediction. Unless the entire video is decoded, only instantaneous decoding refresh (IDR) frames, which are independently decodable, can be used in fast forwarding. If there are no IDR frames or if their frequency is low, then fast forwarding will not be feasible without decoding a large percentage of the coded pictures of the full length video sequence. One way of remedying this may be to transcode the compressed video with more frequent IDR frames. The disadvantages of this approach are as follows:

(1) It may be time consuming and/or computationally expensive.

(2) With an increase of the number of IDR frames, the compression ratio decreases. The transcoded full length sequence with a higher number of IDR frames may be significantly larger than the original compressed full length sequence.

(3) The disadvantages of fast forwarding with raw files still remain.

Another method for skimming compressed video is to subdivide the compressed video into video chapters and view each segment in parallel in separate small windows (as described for skimming raw video). The process of sub-dividing a compressed file suffers from similar disadvantages as temporal sub-sampling. Further, using traditional video compression technologies, spatial sub-sampling is not possible in the compressed domain. In order to generate the required spatially smaller video sequences, a transcoding step may be necessary, with the spatial sub-sampling being performed after the decompression of the original video data and before the compression. Moreover, although use of a compressed video file eliminates the disadvantage of storing a large file, the need to decode the file several times in real-time introduces significant additional cost and processing complexity to spatial re-sampling.

If the full length video file, be it in raw or compressed format, is not available locally, the problem of video skimming according to the described techniques is further exacerbated by the need to retrieve it in real-time to a local computer over a network like the public Internet. Particularly, if the file is in raw format, then the bandwidth requirements are impractically large (e.g., 45 Mbps for a reasonable speed download of an SDTV resolution sequence). Accordingly, given the issues of using raw and compressed video, and using temporal and spatial sub-sampling, there has not been an acceptable implementation of a practical real-time video skimmer in the market place.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates temporal sub-sampling existing in the prior art.

FIG. 2 illustrates spatial sub-sampling existing in the prior art.

FIG. 3 is an exemplary video display screen of the video skimmer in accordance with the present invention.

FIG. 4 is a block diagram illustrating an exemplary system of a standalone video skimmer in accordance with the present invention.

FIG. 5 is a block diagram illustrating an exemplary system with a client-server video skimmer server architecture in accordance with the present invention.

FIG. 6 is a block diagram illustrating an exemplary standalone video skimmer in accordance with the present invention.

FIG. 7 is a block diagram illustrating an exemplary client-server video skimmer in accordance with the present invention.

FIG. 8 is an exemplary method flow for controlling MBW configurations in accordance with the present invention.

FIG. 9 is an exemplary method flow chart for messaging between skimming control logic client and server in accordance with the present invention.

Throughout the drawings, the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components or portions of the illustrated embodiments. Moreover, while the disclosed invention will now be described in detail with reference to the figures, it is done so in connection with the illustrative embodiments.

DESCRIPTION

A video skimmer, according to the present invention, is a system that simultaneously displays multiple chapters of the full length video file, each of which may appear to the user as if it were spatially and/or temporally sub-sampled, by using a video file that has been compressed using a layered (also known as scalable) encoder. The video file may be indexed or un-indexed. According to the invention, no transcoding or sub-sampling in either the temporal or spatial dimension may be required in order to enable the skimming process. The system works efficiently when the full length video file is available locally, or remotely and accessible only over a network, for example the Internet.

According to the invention, the video may be compressed using a layered codec, such as the one disclosed in ITU-T Recommendation H.264 Annex G (also known as SVC). In order to take full advantage of the invention, the scalable video bitstream that is stored, among other things, in the full length video file, should contain at least one low resolution version of the video content, advantageously as a base layer. The low resolution can be stored in the form of a base layer and one or more enhancement layers; however, the mentioned combination of base and/or enhancement layers, after decoding, still results in the low resolution. The resolution can be chosen such that it is suitable, after decoding, for displaying in a mini browsing window (MBW) of the video skimmer display. An MBW can be smaller in spatial size than a full window, which can be optimized to view the full resolution video. Full resolution video may be obtained by decoding a base layer and at least one enhancement layer more than required for the lower resolution. The sizes of the full window and any MBWs can be chosen by the user according to his/her user preferences. The system can include a user interface that can display many MBWs, and each MBW can display a specific video chapter of the full length video. The user interface can also allow the user to set his/her user preferences, for example, number of MBWs, size of each MBW, start time or duration of each video chapter, assignment of chapters to MBWs, and so forth.

The term “codec” is equally used herein to describe techniques for encoding and decoding, and for implementations of these techniques. An encoder converts input media data into a bitstream or a packet stream, and a decoder converts an input bitstream or packet stream into a media representation suitable for presentation to a user, for example digital or analog video ready for presentation through a monitor, or digital or analog audio ready for presentation through loudspeakers. A transcoder converts an input bitstream or packet stream compressed using a compression technique into its original media representation suitable for presentation to a user and then re-converts it into an output bitstream or packet stream using another type of compression technique. Encoders and decoders can be dedicated hardware devices or building blocks of a software-based implementation running on a general purpose CPU.

Set-top-boxes and personal computers (PCs) can be built such that many encoders or decoders may run in parallel or quasi-parallel. For hardware encoders or decoders, one way to support multiple encoders/decoders is to integrate multiple of their instances in the set-top-box or PC. For software implementations, similar mechanisms can be employed.

Traditional video codecs used in video distribution systems provide only a single bitstream at a given bitrate, and no layers. As explained above, when a lower temporal or spatial resolution is required from a full length video file (such as for fast forwarding or for display at a smaller spatial size in an MBW), first, the full resolution file must be decoded to regenerate the raw (uncompressed) video, which then needs to be sub-sampled in temporal and/or spatial dimension, as the case may be, to produce a lower spatio-temporal resolution appropriate for the MBW. This process wastes significant bandwidth (if the full length video file is in a remote location and needs to be transported over a network), time, and computational resources. However, support for lower resolutions is beneficial in the video skimmer to enable display of many video chapters simultaneously, and without consuming processing time and power to generate them. The network bandwidth required to transport video for many MBWs may also be advantageously minimized.

In one embodiment, a skimmer may support “spatial skimming”. A full length video file available in a layered encoded format may readily carry a low resolution version of the actual video content, which may fit into MBWs of the video skimmer system without further spatial sub-sampling after decoding. The skimmer may simultaneously display more than one MBW showing more than one chapter. The user may enlarge the video of a chapter by clicking on the MBW once he/she identifies the scene of interest in an MBW. As a result, the skimmer can request and receive information that enables the skimmer to present to the user a high resolution version of the video content, as disclosed in the co-pending U.S. patent application entitled “Systems, Methods and Computer Readable Media for Instant Multi-Channel Video Content Browsing in digital Video Distribution Systems”, concurrently filed herewith.

In the same or another embodiment, a video skimmer can support temporal skimming. A full length video file available in a layered encoded format may readily carry a temporally sub-sampled lower layer. The skimmer may disregard the timing information in the lower layer and present the video as fast forward video. For example, if the full length video were originally available at 30 fps, and the temporally sub-sampled lower layer is available at 10 fps, the skimmer may display the 10 fps lower layer at 30 fps, thereby speeding up playback by a factor of 3. Once the user clicks on the MBW presenting the fast forward video, the skimmer may display the MBW's content at original speed (in the example, by slowing down playback speed to 10 fps). It may further request and receive temporal scalable enhancement layers that enable full temporal resolution of the MBW's content.
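The arithmetic of the temporal skimming example can be sketched as follows: the speed-up is the ratio of the display frame rate to the frame rate of the temporally sub-sampled lower layer. The function names below are hypothetical and serve only to illustrate the calculation.

```python
def fast_forward_factor(layer_fps, display_fps):
    """Speed-up obtained by showing a `layer_fps` layer at `display_fps`."""
    if layer_fps <= 0 or display_fps <= 0:
        raise ValueError("frame rates must be positive")
    return display_fps / layer_fps


def skim_duration_seconds(content_seconds, layer_fps, display_fps):
    """Wall-clock time needed to skim `content_seconds` of video."""
    return content_seconds / fast_forward_factor(layer_fps, display_fps)


# The example from the text: a 10 fps lower layer displayed at 30 fps.
factor = fast_forward_factor(10, 30)
# A 10-minute chapter (600 s) skimmed at that factor.
duration = skim_duration_seconds(600, 10, 30)
```

Displaying the same layer at its native 10 fps gives a factor of 1, which corresponds to the original-speed playback entered when the user clicks on the MBW.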

The video skimming advantageously uses a lower resolution (spatial and/or temporal) version of the video content from several video chapters to fit into more than one MBW. The user may view several MBWs simultaneously, and may assign specific video chapters to these MBWs. The video chapters may be generated by any of the options discussed before.

In the same or another embodiment, video chapters may be the result of subdividing the full length video into video chapters of a given length. For example, video chapters may be assigned 10-minute intervals of the first 40 minutes of the full length video sequentially, and those video chapters may be displayed in 4 MBWs.

In the same or another embodiment, the user may decide to switch the assignment of video chapters to MBWs during the skimming process (e.g., switch to assign every 10 minutes of the next 40 minutes of the video sequentially to 4 MBWs). An exemplary user interface with 4 MBWs skimming a 40-minute full-length video in only 10 minutes is shown in FIG. 3. The first MBW (301) on the screen (305) displays the first 10 minutes of the full length video. The second and third MBWs (302, 303) on the left side of the screen display minutes 10-20 and 20-30 of the full length video, respectively. As disclosed in the co-pending U.S. patent application, entitled “Systems, Methods and Computer Readable Media for Instant Multi-Channel Video Content Browsing in digital Video Distribution Systems”, concurrently filed herewith, MBWs can be of different shapes and sizes. Accordingly, the right MBW, which displays minutes 30-40 of the full length video, is twice the size of the other three MBWs. A person skilled in the art can easily construct other screen layouts with more or fewer MBWs in different sizes, covering different parts of the full length video.
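The fixed-length subdivision and the switching of chapter assignments described above can be sketched as follows. This is a minimal illustration under stated assumptions: times are whole minutes, chapters are assigned to MBWs in sequential order, and the notion of a "page" of MBWs (the first 40 minutes being page 0, the next 40 minutes page 1) as well as the function name are introduced here for the example only.

```python
def assign_chapters(total_minutes, chapter_minutes, num_mbws, page=0):
    """Return the (start, end) minute range shown in each MBW.

    The full length video is cut into chapters of `chapter_minutes`;
    page 0 holds the first `num_mbws` chapters, page 1 the next ones.
    """
    if chapter_minutes <= 0 or num_mbws <= 0 or page < 0:
        raise ValueError("chapter_minutes and num_mbws must be positive")
    chapters = []
    start = 0
    while start < total_minutes:
        end = min(start + chapter_minutes, total_minutes)
        chapters.append((start, end))
        start = end
    first = page * num_mbws
    return chapters[first:first + num_mbws]


# Four MBWs covering a 40-minute video in 10-minute chapters.
layout = assign_chapters(40, 10, 4)
# For an 80-minute video, the user switches to the next 40 minutes.
next_layout = assign_chapters(80, 10, 4, page=1)
```

Since all four chapters play in parallel, viewing one page takes as long as the longest chapter on it, here 10 minutes.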

DETAILED DESCRIPTION OF THE INVENTION

FIG. 4 shows a standalone video skimmer (401) with an attached display (402). The video skimmer can receive video content from a variety of sources: live feed video content, for example from a camera (403) connected to the video skimmer through interface (404); video content from a DVD (405) attached to the video skimmer through interface (406); or even in the form of a full length video file from an external and/or remote video database (407). The external remote video database can be located on the Internet (409) or other suitable networks, using network interfaces (410, 411). The MBWs are presented on the display (402) (as depicted in FIG. 3). The video skimmer logic is part of the video skimmer (401). The video skimmer can be implemented based on a general purpose computer, e.g., a PC, a standalone computer or some other type of hardware, such as a set-top box in an IPTV environment where the set-top box may be attached to a suitable network (409) such as the Internet.

In case a video file is retrieved from a remote database, one of the well known file transfer protocols such as FTP (RFC959 available at http://www.faqs.org/rfcs/rfc959.html) may be used to transmit video over the Internet (409) (or any other suitable network) over links (410, 411).

A set-top box can be hardware for the video skimmer (401). A TV connected to the set-top-box can be used as the display (402). The set-top-box translates the data received from the network (409) into a signal format the TV understands; traditionally, a combination of analog audio and video signals is used, but recently all-digital interfaces (such as HDMI) have also become common. The set-top-box (on the TV side), therefore, typically includes analog or digital audio/video outputs and interfaces.

Internally, a set-top-box can have a hardware architecture similar to a general purpose computer: a central processing unit (CPU) executes instructions stored in Random Access Memory (RAM) or read-only-memory (ROM), and utilizes interface hardware to connect to the network interface and to the audio/video output interface, as well as an interface to a form of user control (often in the form of a TV remote control, computer mouse, keyboard, or other input device), all under the control of the CPU. A set-top-box may also include one or more accelerator units (for example dedicated Digital Signal Processors, DSP) that may help the CPU with computationally complex tasks of video decoding and video processing, among others. Those units are typically present for reasons of cost efficiency, rather than for technical necessity.

General purpose computers can often be configured to act like a set-top-box. In some cases, additional hardware needs to be added to the general purpose computer to provide the interfaces a typical set-top-box contains, and/or additional accelerator hardware to augment the CPU for video decoding and processing.

The programmable parts of set-top-boxes, PCs, and other devices suitable as the basis of a video skimmer may require instructions, which may be supplied by computer readable media (408).

The set-top-box or general purpose computer may run under an operating system such as Windows. The video skimmer advantageously uses an operating system that allows the simultaneous display of more than one motion video in on-screen windows.

Referring to FIG. 5, the internal architecture of a standalone video skimmer is now described. There are several options for video inputs to the video skimmer, as follows.

(a) Live video content may be fed from the camera (501), which attaches to a live video interface (502) through connection (503). The video interface may connect to frame capture (504). The frame capture may generate video frames that may feed into a layered encoder (505).

(b) Alternatively, the video may be obtained as a file download from a remote video database (506) attached to a network, such as the Internet (507), in which case the file download may be received through a network connection and a file download interface (508). If the video database (506) delivers the video in an appropriate layered format, then it can be processed directly (509). Otherwise, the video may be transcoded into a layered encoded format by a transcoder (510).

(c) In the same or another embodiment, video content (in uncompressed format, or in a compressed format that may not be a layered format) may be received from a DVD or a similar storage medium such as a memory stick (511), in which case it is received on a digital video storage interface (512). If the received video is not in layered encoded format, then it may first be transcoded in transcoder (513) into layered encoded format; otherwise, it may be processed directly (514).

Regardless of the input mechanism as described above, the video may be stored as a full length video file in a layered encoding format in the local video database (515).

A video extractor (516) may be responsible for retrieving a video bitstream from the video database (515) according to control commands it may receive from the skimming control logic (517). The skimming control logic (517) may, for example, indicate the MBW size and the beginning and ending time markers of the video for that MBW to the video extractor, which, in response to the indication, may find the corresponding layered encoded bitstream in the database.

The extracted video may be displayed in the MBW on display (519) after decoding by a layered decoder (518). A user interface (520) may send appropriate display commands to properly display the video. The user input commands may be received through a user interface device (521) (e.g., a keyboard, mouse or remote control device), and may be translated into proper display commands through the user interface (520) and displayed on the display (519). The commands may be sent by the user interface (520) to the skimming control logic (517) for further processing. Such user commands can, for example, include: (a) selection of size and display location of an MBW, (b) a click or double click on an MBW that may result in a request to receive corresponding audio or full resolution video, (c) entering user desired skimming parameters such as index markers for video chapters, etc.

The skimming control logic (517) may receive user commands and may translate them into appropriate actions for the video extractor (516), thereby enabling the video extractor (516) to extract only those video bits required for proper display.
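
As an illustration of how control logic might turn an MBW size and time markers into an extraction command, consider the sketch below. It assumes spatial layers ordered from the base layer upward, each with a known resolution; the Layer type, the function names and the dictionary layout are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    layer_id: int  # 0 is the base layer; higher ids are enhancement layers
    width: int
    height: int


def layers_for_window(layers, mbw_width, mbw_height):
    """Pick the lowest layers needed to fill a window of the given size."""
    selected = []
    for layer in sorted(layers, key=lambda l: l.layer_id):
        selected.append(layer)
        if layer.width >= mbw_width and layer.height >= mbw_height:
            break  # this layer already covers the window
    return selected


def plan_extraction(layers, mbw_width, mbw_height, begin_s, end_s):
    """Build a hypothetical extraction command for one MBW."""
    chosen = layers_for_window(layers, mbw_width, mbw_height)
    return {"layer_ids": [l.layer_id for l in chosen],
            "begin_s": begin_s,
            "end_s": end_s}
```

With layers of 320x180, 640x360 and 1280x720, a 320x180 window would need only the base layer, while a larger window pulls in one or more enhancement layers.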

An alternative implementation of a video skimmer may follow a client-server architecture as illustrated in FIG. 6. The video skimmer may be divided into two components: the video skimmer client (601) and the video skimming server (602). The video skimmer client (601) and display (603) are located at the user's premises. The video skimmer client (601) may be connected to the video skimming server (602) over a suitable network, such as the Internet (604). The video skimming server (602) may reside at any suitable location in the network and not necessarily at the user's premises. A local video database (605) may advantageously be co-located with the video skimming server (602).

The video skimming server may also serve non-co-located, but network attached databases, such as a remote video database (606), via a suitable network such as the Internet (604). However, if the video file is located in a remote video database, that video file may advantageously be downloaded to the local video database (605) in its entirety before starting the skimming process, because the video extraction logic for skimming resides within the video skimming server (602).

A single video skimming server may serve many remote video databases and many video skimmer clients simultaneously. With the separation of the client and server, the video skimming service (server) may become a business for a service provider, which can offer it to many subscribers, each subscriber deploying the client component for skimming.

Although, for simplicity, only a single client and a single server are shown in FIG. 6, the present invention also envisions distributed architectures of the skimming server. Similarly, one or more video skimming clients can run simultaneously on a single client computer.

The video skimmer server may be responsible for extracting the user-requested video file from a local video database, and may send layered encoded video chapters according to a user's requests across the Internet to the client. Note that the video chapters displayed in the MBWs may be sent using only those layers required for proper decoding and display in an MBW without spatial or temporal subsampling, thereby significantly reducing the network bandwidth compared to the transmission of all layers beneficial for decoding and display of the video in a main window (at full resolution).
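
The bandwidth argument can be made concrete with a small calculation. The per-layer bitrates below are purely hypothetical figures chosen for illustration; the disclosure does not specify any bitrates.

```python
def skim_bitrate_kbps(layer_kbps, mbw_layer_count, num_mbws):
    """Total bitrate when each MBW receives only its lowest layers."""
    per_window = sum(layer_kbps[:mbw_layer_count])
    return per_window * num_mbws


def full_bitrate_kbps(layer_kbps, num_windows):
    """Total bitrate when every window receives all layers."""
    return sum(layer_kbps) * num_windows


# Hypothetical bitrates: base layer plus two enhancement layers.
LAYER_KBPS = [150, 450, 1400]

skim_total = skim_bitrate_kbps(LAYER_KBPS, mbw_layer_count=1, num_mbws=12)
full_total = full_bitrate_kbps(LAYER_KBPS, num_windows=12)
```

Under these assumed figures, twelve MBWs fed only the base layer would need 1,800 kbps, against 24,000 kbps if all layers were sent for every chapter.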

In FIG. 7, the detailed architecture of the video skimming server (701) is shown. The video skimming server (701) can include, for example, a skimming control logic server (SCLS) (702) which can communicate with the corresponding skimming control logic client (SCLC) (704) in the video skimming client (703). The SCLC can, for example, specify the MBW layout (e.g., each MBW location and size, or number of MBWs) at the user's endpoint, and video chapter definitions when the video chapters are not indexed (e.g., video skimming start time, and/or video chapter lengths, or any other information that allows for suitable indexing). If the full length video is already indexed, then the server can have the capability to send indexed video chapters, and can also send the metadata associated with each indexed video chapter. If a skimmed version of the full length video is available at the server, then the server can have the capability to send the skims and the metadata associated with each skim.

The SCLS (702) can serve, and can be controlled by, many SCLCs (704) simultaneously.

One purpose of the SCLC (704) is to translate user input into a protocol format that the SCLS (702) understands. For example, when the user clicks on an MBW to view its content as high-resolution video in the main window, the SCLC (704) can send a request to the SCLS (702) to start sending all enhancement layers for decoding and display in the main window in high resolution, in addition to the layers for decoding and display in MBWs (MBW layers), for the video chapter associated with the MBW the user clicked. Meanwhile, the server can continue to send the MBW layers of the video chapters being decoded and displayed in the other MBWs.
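
The effect of such a click on what the server streams can be sketched as a per-chapter layer plan. The function name, the chapter identifiers and the representation of layers as integer ids are assumptions made for this example.

```python
def layers_to_send(chapter_ids, mbw_layers, all_layers, selected=None):
    """Map each chapter to the layer ids the server should stream.

    The selected chapter (shown in the main window) gets every layer;
    all other chapters keep receiving only the MBW layers.
    """
    plan = {}
    for chapter in chapter_ids:
        if chapter == selected:
            plan[chapter] = list(all_layers)
        else:
            plan[chapter] = list(mbw_layers)
    return plan


# Hypothetical session: three chapters, the user clicks the MBW showing "c2".
plan = layers_to_send(["c1", "c2", "c3"],
                      mbw_layers=[0],
                      all_layers=[0, 1, 2],
                      selected="c2")
```

Here only chapter "c2" is upgraded to all layers, while "c1" and "c3" continue at the MBW layers.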

The video to be skimmed may advantageously reside in a local video database (705). If the video file is in a remote database (706), then the file can be retrieved and placed in the local video database (705). If the file in the remote database (706) is in a suitable layered encoded format, it can be deposited directly into the local video database (705). If the video file is encoded using another type of compression technology (including, for example, lossless compression) or is in an uncompressed format, it can first be transcoded in transcoder (707), and then deposited into the local video database (705).

The video skimmer server may reside in one or more computers. The video skimming client may reside in a PC, a standalone general purpose computer, or it may be an IPTV set-top box.

If the user endpoint is a set-top box, it can use, for example, a TV as the display (708) to display the MBWs.

User commands for video skimming may be received through a user input device (709) (e.g., mouse, keyboard or remote control), and can be translated by the user interface (710) into information displayed on display (708). When a user selects a video file for skimming and the skimming configuration (e.g., number of MBWs, assignment of video chapters to MBWs), user interface (710) can send these requests to SCLC (704) which, in turn, may send appropriate skimming messages to the SCLS (702). The SCLS (702) can instruct video extractor (711) to extract appropriate layers of the encoded video that can be stored in the local video database (705). The video extractor (711) can extract the bitstreams and send them to the streaming server (712). The streaming server can send streamed bitstreams using protocols such as RTP (RFC 3550, available from http://www.faqs.org/rfcs/rfc3550.html). On the client side, a streaming client (713) can receive the RTP packets, extract the bitstreams, and send the bitstreams to layered decoder (714), which can decode the bitstreams into raw format ready for display. User interface (710) can collaborate with the SCLC (704) to assign received bitstreams from the layered decoder (714) to the appropriate MBW on the user's display (708).

The SCLC (704) can communicate with its server component SCLS (702) to:

a. set or change the MBW configuration (e.g., number or alignment of MBWs, size of MBWs, location of MBWs on the display);

b. select video chapters (e.g., un-indexed, indexed, or skimmed);

c. configure video chapters (e.g., video chapter start times, lengths, chapter location in video file, mapping to MBWs, etc.); and/or

d. control video in an MBW (e.g., map content to the main window in higher resolution; receive audio; pause/restart/stop video).

In summary, the video skimming client (703) can include, for example, the following functionalities, although others could be added:

a. Receive and process streamed media;

b. Decode streams of video chapters for display;

c. Display video chapters in MBWs;

d. Control skimming logic (e.g., user selection of MBWs, video chapters, start-stop type video controls, and other controls); and/or

e. Receive user commands from user interface devices.

In summary, the video skimmer server (701) can include, for example, the following functionalities, although others could be added:

a. Encode and transcode remote video for storage in the local video database;

b. Extract appropriate bitstreams from the video database for display in an MBW;

c. Receive, process and react to commands from the skimming control logic client; and/or

d. Stream and send video towards client applications.

As shown in FIG. 8, in the same or another embodiment, certain user activities, recognized by the user interface, can lead to certain actions of the skimming control logic, either the local SCLC or the logic that is distributed between SCLS and SCLC. The user activities can be listed in the form of a menu structure. The top level menu can be invoked by pressing (821) a “Menu” button or by a similar user input activity.

On its top level menu, the user interface can offer different pull-down menus to input various user preferences, for example, to: save user settings (801), restore user settings (802), select all MBWs (803), select one MBW (804), select skimming mode (805), and/or cancel (806). By selecting to cancel (806), the user closes the top level menu and all sub-menus that may be open without further interference in the state of the system. If the user selects any of the other selections, he/she is presented with a sub-menu as follows:

If the user selects to save user settings (801), he/she is presented with certain related sub-menu choices, for example, to: save default skimming configuration (807), save MBW configuration (808), and/or cancel (809). Electing to cancel (809) closes the sub-menu without any change in state, and returns to the main menu. Selecting to save default skimming configuration (807) saves, possibly after a confirmation, the current skimming configuration as a default. The skimming configuration can include aspects such as the length of each chapter (e.g., a uniformly assigned length such as 10 minutes, a length that is determined by hints or context, for example based on metadata that may be included in the full length video indicating different scenes, or any other suitable length determination method). Selecting to save default MBW configuration (808) saves the current MBW configuration as a default. The MBW configuration is determined through two of the main menu items discussed below.

The main menu item to restore user settings (802), once selected, restores the previously saved user settings, as stored using the menu item for saving user settings (801).

If the user chooses to select all MBWs (803), he/she can set properties related to all MBWs currently being displayed. Sub-menu items can include, for example, to: close all MBWs (810), arrange all MBWs on screen (811) (which distributes the MBWs evenly over the available screen area), resize all MBWs (812) (which allows the user to set the size of all MBWs), and cancel (813).
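
One possible way to distribute MBWs evenly over the screen is a near-square grid, sketched below. The disclosure does not prescribe a layout algorithm; the grid choice and the function name arrange_grid are assumptions of this example.

```python
import math


def arrange_grid(num_mbws, screen_w, screen_h):
    """Return (x, y, width, height) for each MBW in a near-square grid."""
    if num_mbws <= 0:
        return []
    cols = math.ceil(math.sqrt(num_mbws))
    rows = math.ceil(num_mbws / cols)
    cell_w = screen_w // cols
    cell_h = screen_h // rows
    # Fill the grid row by row, left to right.
    return [((i % cols) * cell_w, (i // cols) * cell_h, cell_w, cell_h)
            for i in range(num_mbws)]
```

For four MBWs on a 1920x1080 screen this yields a 2x2 grid of 960x540 windows; for counts that do not fill the grid, the last row is simply left partly empty.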

If the user chooses to select one MBW (804), he/she first selects the MBW to which the subsequent changes apply. Alternatively, or in addition, this menu item can also advantageously be implemented as a context sensitive menu. For example, right-clicking on an MBW triggers this sub-menu without requiring the user to go through the main menu. The sub-menus can offer different related actions, for example, to:

Map content to the main screen (814) (which closes the skimming user interface and shows the chapter assigned to the MBW in full screen resolution),

Change MBW size (815) (which allows the user to set the size of the MBW without changing the size of other MBWs),

Assign chapter (816) (which allows the user to assign a chapter of the full length video to the selected MBW),

Move MBW (817) (which allows the user to select the MBW's spatial position on the screen without affecting the positions of other MBWs),

Close MBW (818),

Pause/start/stop video in MBW (819), and/or

Cancel (820).

The main menu item to select skimming mode (805) provides related options for skimming mode selection, which may include, for example, fixed interval (822), scene detection (823), hint track (824), and/or cancel (825).

If the user selects the fixed interval sub-menu option (822), the user can select the length of the chapters by providing either an interval (in, for example, seconds, minutes, and/or hours), or the number of equally long chapters into which the video skimmer shall divide the full length video.
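
Both forms of the fixed interval option reduce to computing begin and end time markers for each chapter. A minimal sketch follows, with times in seconds; the function names are hypothetical, and letting the last chapter be shorter when the interval does not divide the duration is an assumption of this example.

```python
def chapters_by_interval(duration_s, interval_s):
    """Split a video into chapters of a fixed length (last one may be shorter)."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    chapters = []
    start = 0
    while start < duration_s:
        end = min(start + interval_s, duration_s)
        chapters.append((start, end))
        start = end
    return chapters


def chapters_by_count(duration_s, count):
    """Split a video into a given number of equally long chapters."""
    if count <= 0:
        raise ValueError("count must be positive")
    length = duration_s / count
    return [(i * length,
             duration_s if i == count - 1 else (i + 1) * length)
            for i in range(count)]
```

For example, a 25-minute video (1500 seconds) with a 10-minute interval gives three chapters, the last covering seconds 1200 to 1500.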

If the user selects the scene detection sub-menu option (823), the video skimmer is instructed to assign each scene, as determined by a scene detection algorithm, to an MBW, up to a user-selectable maximum number of MBWs.

The hint track sub-menu option (824) determines the association of chapters to MBWs using a hint track that may be present in the full-length video; if the hint track is not present, this option may be grayed out.

The cancel sub-menu option (825) leaves the sub-menu without a state change.

FIG. 9 shows an exemplary message flow between the skimming control logic client (SCLC) (901) and the skimming control logic server (SCLS) (902), using a protocol such as HTTP, or employing another standards-based or proprietary protocol. The SCLC can request a specific video chapter mapping to an MBW for an already selected MBW configuration through a video chapter assignment request message (903). This message can contain information about the user (e.g., Client ID), the video file (e.g., File ID or file title), the MBW (e.g., dimensions), and/or the begin and end time markers of the requested video chapter.

If the request is valid (904), then the SCLS (902) can return a video chapter assignment response message (905) to the SCLC (901) indicating that the action is accepted, in which case the SCLS instructs (906) the video extractor to fetch the defined bitstream from the local database. If the request cannot be implemented, then the SCLS returns a video chapter assignment response message (907) indicating that the action is not valid.
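
The request and response exchange of FIG. 9 can be sketched on the server side as follows. The field names, the validity checks and the response dictionaries are assumptions for illustration; the disclosure lists the kinds of information a request can carry but does not define a wire format or validation rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChapterAssignmentRequest:
    client_id: str
    file_id: str
    mbw_width: int
    mbw_height: int
    begin_s: float
    end_s: float


def handle_request(req, file_durations, extract):
    """Validate a request; on success, instruct the extractor and accept."""
    duration = file_durations.get(req.file_id)
    valid = (duration is not None
             and req.mbw_width > 0 and req.mbw_height > 0
             and 0 <= req.begin_s < req.end_s <= duration)
    if not valid:
        return {"status": "rejected"}   # corresponds to response (907)
    extract(req)                        # corresponds to instruction (906)
    return {"status": "accepted"}       # corresponds to response (905)
```

The extractor is passed in as a callable so that the same handler could drive a local extractor or a remote one in a distributed server.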

1. A system for simultaneous display of a plurality of chapters of a full length video file, wherein the full length video file comprises video in a layered coding format.
2. The system of claim 1, wherein the full length video file comprises: a first layer representing a first spatial resolution; and at least one enhancement layer based on the first layer representing at least one second spatial resolution.
3. The system of claim 2, wherein the first layer comprises a base layer.
4. The system of claim 1, wherein the full length video file comprises: a first layer representing a first temporal resolution; and at least one enhancement layer based on the first layer representing at least one second temporal resolution.
5. The system of claim 4, wherein the first layer comprises a base layer.
6. The system of claim 1, wherein the full length video file comprises one or more video files located in a local database.
7. The system of claim 1, wherein the full length video file comprises content received from a live feed.
8. The system of claim 1, wherein the full length video file comprises content received from a digital video storage interface.
9. The system of claim 1, wherein the full length video file comprises content received from a remote video database.
10. The system of claim 1, further comprising an input, coupled to the display system, adapted for permitting a user to control the simultaneous display of a plurality of chapters through at least one user preference.
11. The system of claim 10, wherein the at least one user preference comprises a preference to specify a number of windows.
12. The system of claim 10, wherein the at least one user preference comprises a preference to specify a duration for a video chapter.
13. The system of claim 10, wherein the at least one user preference comprises a preference to specify a start time for a video chapter.
14. The system of claim 10, wherein the at least one user preference comprises a preference to assign at least one chapter to at least one window.
15. The system of claim 1, further comprising one or more skimming control logic modules configured to control at least one aspect of a display system.
16. The system of claim 10, further comprising one or more skimming control logic modules configured to control at least one aspect of a display system.
17. The system of claim 16, wherein the skimming control logic further comprises logic adapted to translate one or more user preferences into video extraction commands.
18. A method of video skimming, comprising: a. extracting a plurality of chapters of a full length video file, wherein the chapters are coded in a layered bitstream format, b. decoding the plurality of chapters, and c. simultaneously displaying each of the decoded plurality of chapters.
19. The method of claim 18 wherein the layered bitstream format comprises a first layer representing a first spatial resolution and at least one enhancement layer based on the first layer representing at least one second spatial resolution, the decoding further comprises: a. decoding the first layer of the plurality of chapters; and b. decoding at least one of the enhancement layers of the plurality of chapters.
20. The method of claim 18, further comprising extracting the full length video file from a local database.
21. The method of claim 18, further comprising receiving a full length video file content from a live feed.
22. The method of claim 18, further comprising receiving the full length video file content from a digital video storage interface.
23. The method of claim 18, further comprising receiving the full-length video file from a remote video database.
24. The method of claim 18, wherein the method further comprises receiving a user input to control the simultaneous display of a plurality of chapters through at least one user preference.
25. The method of claim 24, wherein the at least one user preference comprises a preference to specify a number of windows.
26. The method of claim 24, wherein the at least one user preference comprises a preference to specify a duration for a video chapter.
27. The method of claim 24, wherein the at least one user preference comprises a preference to specify a start time for a video chapter.
28. The method of claim 24, wherein the at least one user preference comprises a preference to assign at least one chapter to at least one window.
29. The method of claim 24, further comprising translating the at least one user preference into at least one video extraction command.
30. A computer readable media having computer executable instructions included thereon for performing a method of video skimming, comprising: a. extracting a plurality of chapters of a full length video file, wherein the chapters are coded in a layered bitstream format, and b. decoding the plurality of chapters, and c. simultaneously displaying each of the decoded plurality of chapters.
31. The computer readable media of claim 30, wherein the layered bitstream format comprises a first layer representing a first spatial resolution and at least one enhancement layer based on the first layer representing at least one second spatial resolution, the decoding further comprises: a. decoding the first layer of the plurality of chapters; and b. decoding at least one of the enhancement layers of the plurality of chapters.
32. The computer readable media of claim 30, wherein the method further comprises extracting the full length video file from a local database.
33. The computer readable media of claim 30, wherein the method further comprises receiving a full length video file content from a live feed.
34. The computer readable media of claim 30, wherein the method further comprises receiving the full length video file content from a digital video storage interface.
35. The computer readable media of claim 30, wherein the method further comprises receiving the full-length video file from a remote video database.
36. The computer readable media of claim 30, wherein the method further comprises receiving a user input to control the simultaneous display of a plurality of chapters through at least one user preference.
37. The computer readable media of claim 36, wherein the at least one user preference comprises a preference to specify a number of windows.
38. The computer readable media of claim 36, wherein the at least one user preference comprises a preference to specify a duration for a video chapter.
39. The computer readable media of claim 36, wherein the at least one user preference comprises a preference to specify a start time for a video chapter.
40. The computer readable media of claim 36, wherein the at least one user preference comprises a preference to assign at least one chapter to at least one window.
41. The computer readable media of claim 36, wherein the method further comprises translating the at least one user preference into at least one video extraction command.
42. A video skimming server for preparing and distributing a plurality of chapters of a full length video file, wherein the full length video file comprises video coded in a layered bitstream format, comprising: (a) a video extractor for extracting a plurality of encoded audio-visual signals from a full length video file; (b) a streaming server for distributing the extracted audio-visual signals; and (c) skimming control logic server, coupled to the video extractor, adapted for receiving at least one control message from a video skimmer client and instructing the video extractor to extract the audio-visual signals.
43. The video skimming server of claim 42, wherein the full length video file comprises one or more video files located in a local video database.
44. The video skimming server of claim 42, wherein the full length video file comprises content received from a live feed.
45. The video skimming server of claim 42, wherein the full length video file comprises content received from a digital storage interface.
46. The video skimming server of claim 42, wherein the full length video file comprises content received from a remote database.
47. The video skimming server of claim 46, further comprising a transcoder, coupled to the video extractor and to the remote database, for transcoding the content received from the remote database.
48. The video skimming server of claim 42, wherein the video extractor comprises at least a portion of a distributed server.
49. The video skimming server of claim 42, wherein the skimming control logic server comprises at least a portion of a distributed server.
50. A video skimmer client for preparing a customized display of a plurality of chapters of a full length video file, wherein the full length video file is coded in a layered bitstream format, comprising: (a) at least one streaming client module configured to receive a plurality of chapters, wherein the chapters are in a layered bitstream format; (b) one or more decoders configured to decode the chapters; (c) a graphical user interface for receiving user input, wherein the graphical user interface is accessed through a video display; and (d) a skimmer control logic client for sending at least one control message to a video skimmer server.
51. The video skimmer client of claim 50, wherein said video display comprises a television.
52. The video skimmer client of claim 50, wherein said video display comprises a computer monitor.
53. The video skimmer client of claim 50, wherein said video skimmer client comprises at least a portion of a television.
54. The video skimmer client of claim 50, wherein said receiver comprises at least a portion of a general purpose computer.
55. The video skimmer client of claim 50, wherein said video skimmer client comprises at least a portion of a set-top box.
56. The video skimmer client of claim 50, wherein said video skimmer client comprises at least a portion of a gaming console.